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ABSTRACT 

Over  the  past  several  years  benchmarking  has  been  devel- 
oped into  an  effective  technique  for  performance  analyses  of 
computer   systems.  Relational   database   machines      are    rela- 

tively new  compter  systems  for  which  a  benchmarking  tech- 
nique  does   not    yet    exist;. 

The  benchmarking  of  relational  database  machines 
involves  the  indent ification  and  design  of  test  programs 
through  which  relevant  performance  data  can  be  gathered  and 
interpreted.  All  features  of  relational  database  management 
must  be  considered  when  designing  these  test  programs.  The 
join  operations  are  an  important  feature  of  relational  data- 
base   management. 

The  test  programs  for  the  join  operations  necessarily 
include  the  repetition  of  certain  queries  during  which 
specific      join      parameters    are      varied.  These      parameters 

include:  tuple  size,  relation  size,  disk  placement,  and  the 
use      cf    indices.  A  number      of    join      operations    have      been 

benchmarked.  These       operations      are        equality        joins, 

inequality  joins,  three-way  joins,  and  virtual  joins  (i.e., 
views) .  In  addition,        a      number      of   relational      database 

machine  configurations  have  been  utilized  for  benchmarking 
the    join   operations. 

The  highlights  of  the  thesis  can  be  found  in  its  contri- 
bution tc  a  benchmarking  technique  for  the  join  operations 
and  its  conclusions  on  the  performance  analyses  of  various 
relational   machines    in  operating    joins. 
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I-    INTRODUCTION 

A.  WHAT    IS    BENCHMARKING? 

The  term  "benchmark"  has  its  origin  in  the  field  of 
geographical     surveying.  A      benchmark      is        a      permanent 

geographic  feature  which  serves  as  a  landmark  for  surveying. 
The  term  has  evolved  into  defining  a  standard  or  criterion 
associated  with  a  particular  type  of  system  or  product. 
This  standard  serves  as  a  point  of  reference  to  which  func- 
tionally  similar   systems   or    products    can   be   compared. 

In  the  realm  of  computer  science  a  benchmark  consists  of 
a  standard  set  of  instructions  or  programs.  The  execution  of 
the  set  en  one  system  provides  measurements  that  can  be  used 
to  compare  with  measurements  obtained  by  running  the  same 
set    on      another    system.  This   is      the    essence      of   computer 

system  benchmarking:  the  process  of  conducting  controlled 
experiments  to  collect  indicators  of  comparative  performance 
cf   different  computer  systems. 

B.  THE    "GIBSON    MIX" 

Comparisons  of  computer  systems  were  prompted  by  the 
increasing  application  of  the  systems  in  business  and  other 
situations  in  a  cost-effective  way.  This  interest  in  compa- 
rative performance  of  systems  had  resulted  in  the  controlled 
experiments  cf  the  systems.  In  1970,  J.C.  Gibson  introduced 
a  system  of  programs  sets  or  "mixes"  by  which  variable  types 
cf    wcrklcads  could    be   compared.  The    "Gibson   Mix"    approach 

to  comparing  systems  is  based  on  testing  several  se  +  s  of 
applications  in      both   business      and   science.  The    results, 

execution        times,       cf      these      tests      were    published.  The 

problem   of   selecting      a    particular  computer   system      could   be 
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reduced  tc  establishing  workloads  as  multiples  of  the  mix. 
By  properly  balancing  execution  t»mes  and  mix  multiples, 
system  evaluators  could  produce  comparative  estimates  for 
total   computer    systems. 

C.       BENCHMARK    DESIGN    AND    OBJECTIVES 

Benchmarking  as  a  technique  for  comparisons  of  computer 
performance  has  enjoyed  increasing  popularity  over  the  past 
decade.  This  approach  is  appealing  both  to  producers  and 
consumers.  Basic  guidelines  have  been  developed  for  the 
proper  use  of  benchmarks.  The  benchmark  must  be  representa- 
tive cf  real-world  workloads,  and  the  mix  of  instructions 
should  be  inclusive  enough  to  provide  as  much  relevant  data 
as  possible.  Additionally,  the  relevance  of  benchmark 
content  must  be  justifiable.  The  benchmark  should  be  care- 
fully designed,  and  objectives  should  be  specifically  stated 
so  that  the  proper  sequence  of  steps  in  the  benchmark 
progression  can  be  set  down.  Objectives  may  include  evalua- 
tion towards  procurement,  design  analysis,  component  certi- 
fication, quality  determinations,  load  analysis,  improvement 
of  performance,  or  ether  objectives  as  determined  by  these 
requiring  the  benchmark.  The  benchmark  should  be  tailored 
to  the  objective  and  deal  with  those  demands  or  applications 
which  initially  formed  the  basis  of  and  the  requirement  for 
the    benchmarking.  The    benchmark      must    be      controlled    from 

design  through  implementation  and  throughout  the  interpreta- 
tion   of   the   results. 
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II.    THE    BENCHMARKING    ENVIRONMENT 

Tha  experiments  described  in  this  paper  have  been 
conducted  on  several  configurations  of  an  RDM  1100  at  the 
Data  Processing  Service  Center  West,  Naval  Air  Station, 
Point      Mugu,      California.  The    RDM      1100      and   its      various 

configurations  are  relational  database  machines,  each  of 
which  is  designed  to  be  the  backend  of  UNIVAC  1100  seri  =  s 
computers. 

A.  TEE    HCST  COMPOTES 

The  hcst  computer  system  of  which  a  relational  database 
machine  is  used  as  the  backend  is  the  UNIVAC  1100/42.  No 
modifications  have  been  required  of  the  UNIVAC  operating 
system.  Specially  designed  host-residsnt  software  has  been 
installed  in  the   UNIVAC. 

B.  THE    HCST   COMPUTES/DATABASE    MACHINE    INTERFACE 

Figure  2.1  depicts  the  presently  available  methods  for 
interfacing  between  the  host  and  the  backend.  The  first 
method,  the  relational  query  language,  is  a  command  inter- 
face. The  second  method  allows  the  user  to  execute  a  series 
of  queries  by  referring  to  a  set  of  stored  commands.  The 
third  method  is  via  user  programs  written  in  high-level 
programming  languages  such  as  COBOL  and  FORTRAN  in  which  a 
subroutine  is  provided  for  accessing  data  stored  in  the 
cacksnd  machine. 

In  the  RDM,  host  interfacing  is  accomplished  by  both 
parallel  and  serial  interface  modules  (processors)  (see 
Figure  2.2) .  Each  interface  module  can  support  up  to  8  host 
systems. 
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C.       THE    EENCHMAEKED    RELATIONAL    DATABASE    MACHINE 

The  tasic  relational  database  machine  on  which  the 
benchmarking  experiments  have  been  conducted  is  a  modularly 
designed,        microprocessor- based      database      computer.  The 

modules  are  organized  around  a  single  high-speed  bus  (see 
Figure    2.2   again). 

1 •      Technology    and  Functionality    of    Modules 

a.  The    Database   Processor 

This-Z8000  series  microprocessor  controls  the 
flow  cf  data  by  translating  user  queries  into  procedures. 
Additionally,  this  processor  supervises  system  resources, 
coordinates  hardware  monitoring,  and  performs  bus  arbitra- 
tion. The  processor  contains  approximately  99%  of  C-codes 
and      operates  at      1/2     MIP.  If   the      database      accelerator 

(described  below)  is  available,  the  database  processor 
senses   its   availability   and    issues   calls    for    its    services. 

b.  The    Accelerator 

This      high-speed,  auxilary      processor        which 

executes  instructions  at  10  MIPS  is  built  from  ECL  logic. 
It  has  a  three-stage  pipeline  and  is  designed  to  optimize  a 
well-defined  collection  cf  often  used  database  management 
subroutines.  The      accelerator      can    filter     data      at      disk 

transfer    rates. 

c.  The    Cache 

This  main  memory  is  composed  of  64K  dynamic  ram 
chips  and  is  expandable  up  to  6  megabytes.  System  informa- 
tion and  code  occupy  approximately  360K  cf  this  memory. 
Cache  is  allocated  in  2K  blocks,  contiguously  whenever 
possible.  The  paging  algorithm  is  basically 

Least-Recently-Used,    and  the  system   code   is   never    paged   out. 
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d.      Disk  Drives    and   the   Secondary    Storage 

The  disk  controller  module  performs  burst  error 
detection  and  correction  and  retry  without  intervention  by 
the  database  processor.  This  controller  can  manage  from  one 
to  four  disk  drives  with  each  drive  having  a  capacity  of  one 
to  four  disks.  Presently  there  ara  two  disks  available  with 
each  disk  capable  of  storing  approximately  600  megabytes  of 
data . 

2-      Different      Accelera  tor        and      Cache        Configurations 
Tested 

The    benchmarking   experiments    have      been    conducted  on 
the    following  different    machine  configurations; 

a.  1/2-megabyte         cache        without  the         database 
accelerator 

b.  2-megabytes   cache   with  the   database    accelerator 

c.  2-megabytes        cache  without        the  database 
accelerator 

D.       THE    DATABASES 

The  relational  database  machine  handles  data  in  2K  byte 
blocks.  With  this  in  mind  a  syntesized  database  has  been 
designe-3.  Tuple  (record)  lengths  of  100  bytes,  200  bytes, 
1000  bytes,  and  2000  bytes  have  been  chosen,  thereby 
providing  a  range  of  1  to  20  tuples  per  block.  It  has  been 
sought  through  experimentation  to  contrast  the  same  opera- 
tions performed  on  relations  with  different  numbers  of 
tuples  per  block.  It  is  felt  that  this  approach  may  provide 
some    measure  of    processor-overhead  time    versus  I/O   time. 
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1 .      Database   Generation 

Standard  templates  fcr  each  of  the  four  different 
tuple   lengths  have    been   designed.  Table    I   describes    these 

templates.  Note  that  within  each  template  there  are  attri- 
butes (fields)  that  are  common  to  all  four  templates: 
sequential  integers  and  random  integers.  The  attributes  of 
sequential  and  random  integers  can  be  used  to  enforce 
different  orderings  of  the  same  data.  Each  template  also 
contains  attributes  specified  with  values  uniformly  distri- 
buted over  a  number  of  enumerated  values.  3y  ensuring 
specified  distribution,  the  reliability  of  equality  joins 
can    be   assured. 

The  actual  relations  for  the  experimental  databases 
have  teen  generated  en  an  IBM  3033  system  in  batch  mode  and 
have  been  transferred  to  tape  for  transport  to  the  UNI  VAC 
system.  Fcr  each  of  the  four  tuple  lengths,  relations  have 
been    generated    with    500,    1000,    2500,    5000,    and    10000   tuples. 

2-      Pat  abas  e  Creation,    Loading,    and    Disk    Placement 

In  the  environment  cf  the  database  machine,  the 
number  cf  2K-byte  blocks  assigned  to  a  database  is  specified 
with  the  CREATE  DATABASE  command  in  the  query  language 
(Section  II. E  further  describes  the  query  language).  Since 
database  allocations  are  made  in  the  whole  number  of  cylin- 
ders, the  number  of  blocks  specified  will  be  rounded  up  to 
the  first  whole  number  of  cylinders.  Once  the  allocation  is 
made,  the  number  of  blocks  actually  allocated  is  returned  to 
the    user.  The   syntax   for      database   creation   in      the   query 

language    is: 

CREATE   DATABASE     (name)     WITH     (options) 
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options : 

demand    -   the    number    of   blocks    to   allocate 

disk  -    the    disk  on    which   allocation   is  desired 

For    example, 

CREATE    DATABASE   NPSTEST   with    demand    =    1000    on    "DSK001", 

demand  =   2  000   on   "DSKSYS" 

would  set  aside  1000  blocks  on  the  disk  named  "DSK001"  and 
2000  blocks  on  the  disk  named  "DSKSYS"  for  the-  database 
"NPSTEST". 

Once  the  database  has  been  assigned  disk  space, 
relations   in  that   database    may    be   created   as    follows: 

CREATE   relation-name      ({field   name)    =    (format),..., 
(field   name)    =    (format)) 

The  above  command  would  set  up  an  empty  relation  in  the 
database  to  which  tuples  could  then  be  appended.  A  database 
is    opened   by  simplying  entering:      "OPEN     (database    name)  ". 

In  order  to  bulkload  records  into  relations  in 
specified  databases,  utility  programs  have  been  provided. 
The  experimental  relations  that  have  been  generated  on  the 
IBM  3033  system  and  subsequently  loaded  into  the  UNIVAC 
system  have  been  translated  into  the  backend  machine  using 
these   utility   programs. 

Initially,  we  have  attempted  to  manipulate  the 
placement  of  relations  in  a  database.  That  is,  once  a  data- 
base has  been  allocated  with  disk  space  by  the  CREATE 
command,      w«=  have     tried    to    force   a   specific     placement   of  a 
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relation  on  a  particular  disk.  We  have  assumed  that  for  a 
join,  optimization  can  be  achieved  if  the  relations  tc  b9 
joined  are  physically  located  on  different  disks.  However, 
cur  attempts  at  placement  have  proven  futile.  The  designers 
of  the  tackend  machine  utilized  certain  placement  algo- 
rithms. These  algorithms  are  proprietary  and  are,  there- 
fore,   unavailable   for  our   modification. 

The  query  language  for  the  machine  allows  the  crea- 
tion of  indices  for  quicker  data  access.  The  creation  of 
these  indices  and  their  use  is  described  in  the  following 
section. 

3 .      Indices 

Simply  stated,  indices  are  designed  to  provide  more 
direct  access  to  stored  data.  The  query  language  for  the 
relational  database  machine  allows  for  the  creation  of  two 
different  types  of  indices.  A  "clustered"  index  is  one  for 
which  the  tuple  is  physically  in  the  order  of  ■'■he  value  in 
the  specified  fiald.  A  "nonclustered"  index  is  one  that  is 
created  for  a  field  or  group  of  fields  for  which  the  tuple 
is   not   clustered. 

Note  that  in  NPSTEST  all  of  the  relations  have  been 
created  with  clustered  indices.  Also,  as  they  are  described 
below,  indices  for  certain  relations  in  other  experimental 
databases  may  be  created,  destroyed,  and  then  recreated 
during  the  course  of  the  run  stream  for  a  particular  join 
experiment. 

4  •      li±   Ex  peri  mental   Databases 

Table  II  describes  the  experimental  databases.  As 
they  are  explained  more  fully  below  in  individual  experiment 
descriptions,  the  size  of  the  databases,  the  number  of  rela- 
tions in  the  databases,  and  the  indices  employed  are  all 
factors   in    the    measurements    obtained. 
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TABLE    II 
The    Experimental  Databases 
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E.       THE    CDEPY   LANGUAGE    FOE    THE    DATABASE    MACHINE 

Incorporated  as  an  integral  part  of  the  relational  data- 
base system  is  the  query  language  (RQL  in  the  case  of  the 
RDM    1100).  This    query    language    is      designed  to    be      both   a 

definition  language  and  manipulation  language  for  the  data 
stored   in  the  machine. 

1 .      Semantics  and  Syntax 

The  use  of  the  CREATE  command  for  both  databases 
and  relations  has  previously  been  discussed.  The  following 
discussion  seeks  to  describe  these  features  of  the  query 
language  that  are  essential  to  an  understanding  of  the 
nature  of  experiments  that  have  been  conducted  on  the  join 
operation. 

a.      EEGIN    (transaction    name) 

This  command  is  used  whenever  multiple  RQL  commands  are  to 
be   treated   as  a    single  command. 

t.      END    (transaction    name) 

This  command  is  used  at  the 'end  of  the  group  of  RQL  commands 
under   BEGIN. 

c.  CREATE    VIEW     (view    name) 

This  command  is  used  to  set  up  a  virtual  relation  within  a 
database. 

d.  DEFINE  (stored  command  name) 

This  command  is  used  to  define  a  stored  command  for  a  parti- 
cular database.  The  command  so  defined  can  be  referenced 
simply  by  its  name. 
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e.  DESTROY     (object    name) 

This  command  is  used  to  eliminate  databases,  relations, 
indices,  views,  stored  commands,  or  other  constructs  from 
the    system. 

f.  RANGE   of    (range    variable)     is    (relation    name) 

Range  variables  are  used  to  allow  the  user  tc  establish  a 
synonym  for  a  relation  name.  Once  this  synonym  is  estab- 
lished  it  can  be   used   in   lieu   of   the    relation   name. 

g.  RETRIEVE    (target    list)    tfHERE    (qualification) 

This  is  the  most  essential  command  for  performing  join  oper- 
ations. Relations  cr  portions  of  relations  are  pulled  from 
storage   and     displayed  for      the   user.  The   data      retrieved 

depends  en  the  user  supplied  qualification  which  may  include 
singular  or  multiple  equalities  and  inequalities.  Up  to  22 
fields   can   be  specified   in    the   target    list. 

h.      RETRIEVE    (variable   name   =    GETTIME    ()  ) 

GETTIME  is  a  function  in  RQL  that  allows  the  user  to 
retrieve   a      time   statement      from    the      RDM   clock.  The  time 

integer  retrieved  is  in  1/60  seconds.  As  will  be  described 
telow,    we   used    these    times    for   our   computations. 

2 .      The    Experimental   Queries 

Queries  have  teen  designed  utilizing  those  features 
of  RQL  described  above.  The  query  streams  have  been 
designed  as  sets  of  transactions,  and  the  joins  have  been 
designed  as  stored  commands  so  that  the  commands  could  be 
pre-parsed  in  order  that  parsing  time  would  be  eliminated 
from      the    join      time   measurements.  The      number    of      fields 

targeted  for  a  join  is  described  below  in  individual  experi- 
ment   descriptions. 
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III.    THE    BENCHMARKING 

A.  GOALS 

The  experiments  described  in  this  paper  are  directed 
towards  the  development  of  a  procedure  by  which  database 
machines   may      be   benchmarked.  The    efforts      described   here 

represent      only      a      portion    of      the      research.  Interested 

readers  are  directed  to  [Ref.  1],  [Ref.  2],  and  [Ref.  3]  for 
additional  research  en  selection  and  projection,  database 
administration,    and    database   generation. 

Ihe  goal  of  these  experiments  is  not  to  make  a  defini- 
tive pronouncement  en  the  performance  of  the  various 
configurations    of      the  RDM    1100.  Rather,      the    goal      is   to 

learn  how  to  design  benchmarks  and  interpret  the  results  of 
the  benchmarking  experiments.  Towards  this  end,  the  method- 
ology must  be  machine  independent,  and  the  workload  model 
must    be    based  on   a    mix  of   database  management   statements. 

B.  THE    METHODOLOGY 

The  workload  has  been  modeled  as  a  collection  of  queries 
in  the  relational  query  language  (RQL) .  The  primary  bench- 
mark kernel  for  the  join  operations  is  the  RETRIEVE  state- 
ment with  associated  qualifications.  In  designing  this 
workload,  classes  of  queries  have  been  identified.  These 
include  data-intensive  and  overhead-intensive  classes.  The 
workload  has  been  constructed  as  a  combination  of  queries 
from  each  class.  The  query  language  has  functioned  as  the 
primary  tool  for  performance  measurement  since  neither 
sorftwars  nor  hardware  probes  have  been  available  for  use  in 
conducting  these  experiments.  Using  the  functions  provided 
in    the   query     language,      elapsed   times   are      measureable   from 
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the  database  machine  clock  in  seconds.  Since  the  goal  of 
these  experiments  is  to  learn  the  effects  of  varying  parame- 
ters on  machine  performance  and  not  absolute  machine  perfor- 
mance,   this    "rough"    measurement   technigue    is    acceptable. 

The  operating  system  of  the  host  machine  allows  the  use 
cf  pre-defined  commands  and  queries  known  as  scripts  which 
has  eliminated  the  fluctuation  of  terminal  time. 
Additionally,  the  fluctuation  of  the  parse  time  has  been 
eliminated  by  using  pre-parsed  commands  stored  in  the  data- 
base. However,  seme  fluctuation  is  introduced  by  the 
guary-post  processor  which  formats  data  for  screen  display, 
but    this   is   not    significant    within  the   query   sets. 

The  initial  approach  to  defining  relevant  queries  has 
been  to  concentrate  en  the  repetition  of  certain  operations. 
During  this  repetition,  given  factors  have  been  varied  to 
ascertain  effects  on  performance.  For  the  join  operations, 
tuple  sizes,  database  sizes,  index  structure,  disk  place- 
ment,   and   machine  configuration   have    been   varied. 

By  and  large,  the  query  streams  have  been  run  in  a  cont- 
rolled environment.  To  offset  the  workload  variability  of 
the  host  machine,  runs  have  been  conducted  during  times  of 
minimal  activity  on  the  host.  Likewise,  use  of  the  database 
machine    has    been   restricted    to  a    single    user. 

C.       THE    JCIN   OPERATIONS 

Several  groups  of  experiments  have  been  conducted  during 
which  certain  parameters  have  been  varied  during  repetitions 
of      the      same      experiment.  These      experiments      have      been 

designed  to  obtain  measurements  on  one  particular  aspect  of 
relational    database    management:    the    join   operations. 
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1 .      A   Formal  Def  ir.it ion    of   a    Join 

Simply  stated,  a  join  is  a  composition  of  two  or 
more  relations.  In  relational  algebra,  a  join  can  be 
expressed  as  follows:  ^hs  8-join  of  column  x  of  table  P  and 
column  y  of  table  S  is  a  table  whose  rows  are  in  the 
Cartesian  Product  of  R  and  S,  such  that,  for  the  mathemat- 
ical operator  9,  the  row  element  x  of  R  and  the  row  element 
y   of    S   held   true   for   6. 

2 •      The   Jo  in   in    the   Benchmarked  Query   Language 

In  the  benchmarked  query  language,  RQL,  a  join  is 
accomplished  using  the  RETRIEVE,  RANGE,  and  qualifier  WHERE 
commands.      For    example: 


3FL*r ion: 


lLAiTNAME { *GF lOfcPf 

I I I 

|  BROWN              1       ^Sl.JHS 
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Given   the   above    relations,    a  -ypical    join    query    in   RQL   could 
be : 

RANGE  of   P    is   Personnel 

RANGE  of   D    is   Department 

RETRIEVE  (P. lastname ,    D.phcne, D. office) 

WHERE  P.dept    =  D.name 


This  query    wculd   return: 


IHSTNAMtl  PHI  INF  |  OFFICE  I 

I \ I 1 

IH^OWN  |  2*>i  \  141  t 
|*HITF  |  2M  I  14  1  t 
I  SMI  I H  I  2nS  |  14?  I 
I 1 I » 
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D.       EQUALITY   JOINS 

1 .  The   Definition  and    Examples 

An  equality  join  is  one  in  which  9  is  defined  as  the 
mathematical  equaltiy    (i.e.  ,      =)  .  That    is,      the   statement 

following  the  qualifier  WHERE  in  RQL  contains  either  a 
singular  cr  multiple  equalities.  For  example,  using  the 
relations  described  above,  the  following  retrievals  repre- 
sent   two    different    eguality    joins: 

RETRIEVE (P.  lastname,  D .phone) 
WHERE   P.Dept   =   D.name    and    P.  age    =    "25" 

or 

RETRIEVE (P. lastname, D. phone) 

WHERE    P.dept  =    D.name   and   D.name   ■    "OPS" 

and  P. age   =   "25" 

2 .  The    Databases   Used 

Equality  joins  represent  the  vast  majority  of  exper- 
iments conducted  during  this  research.  Equality  joins  have 
been    conducted    on  all  of    the   databases   listed   in    Table    II. 

3  •      Queries    Used 

Equality  joins  have  been  run  with  both  singular  and 
multiple  qualifications  (i.e.,  singular  or  multiple  equali- 
ties in  the  WHERE  clause) .  The  majority  of  the  joins  have 
been  conducted  on  singular  qualifications,  and  the  discus- 
sions below  focus  primarily  on  those  experiments.  The 
multiple-qualification  joins  will  be  discussed  separately. 
The  singularly-qualified  joins  are  equated  on  the  KEY  field 
of   each   relation. 
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4 .       Results 

As  previously  explained,  the  methodology  emphasizes 
varying  parameters  throughout  the  repetition  of  the  same 
group  of  experiments.  The  queries  and  their  results  are 
presented  below,  and  they  are  grouped  by  the  parameter  that 
has    been    varied. 

a.      Variability   of    Relation   Size   and   Tuple    Size 

Figure  3.1  depicts  three  joins  of  relations 
whose  tuple  size  is  of  100  bytes.  The  first  equality  join 
involves  a  relation  of  500  tuples  and  another  relation  of 
1000  tuples.  The  second  equality  join  involves  a  relation 
of  2500  tuples  and  another  of  5000  tuples.  The  third 
equality  join  involves  a  relation  of  5000  tuples  and  another 
of  10000  tuples.  It  is  clearly  evident  that  the  join  times 
increase  linearly  as  the  number  of  tuples  being  joined 
increases   linearly. 

We  now  vary  the  tuple  size  for  all  three  rela- 
tions. Thus,  we  benchmark  the  three  relations  whose  tuple 
size  is  of  200  bytes.  This  is  depicted  in  Figure  3.2.  The 
benchmark  of  the  relations  whose  tuple  size  is  of  1000  bytes 
is  depicted  in  Figure  3.3,  and  the  benchmark  of  the  rela- 
tions whose  tuple  size  is  of  2000  bytes  is  depicted  in 
Figure   3.4.  The    linearity   demonstrated   earlier      in   Figure 

3.1    is   again  evident    in   these    joins. 

Figure  3.5  is  a  compilation  of  Figures  3.1,  3.2, 
3.3,  and  3.4  in  which  the  slopes  (or  the  rates)  of  linearity 
may  be  compared.  It  is  important  to  note  that,  the  bigger 
the    tuple    size    there    is;    the   steeper    the   slope   will    be. 


23 


D 
C 


BENCHMARKED  RELATIONS  WITH  SMALL,  FIXED 
TUPLE  SIZE  OF  100  BYTES  -  DATABASE 
NPSTEST 
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Figure  3.1    Eenchmarked  Relations  -  Small  Tuple  Size. 
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BENCHMARKED  RELATIONS  WITH  MEDIUM,  FIXED 
TUPLE  SIZE  OF  200  BYTES  -  DATABASE 
NPSTEST 
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Figure  3.2    Benchmarked  Relations  -  Medium  Tuple  Size. 
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BENCHMARKED    RELATIONS    WITH   LARGE.    FIXED 
TUPLE    SIZE    OF    1000    BYTES    -    DATABASE 
NPSTEST 
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Figure   3.3        Bench marked    Relations   -    Large   Tuple    Siz«. 

31 


D 
C 


BENCHMARKED    RELATIONS    WITH    VERY   LARGE, 
FIXED    TUPLE    SIZE    OF    2000    BYTES   - 
DATABASE    NPSTEST 
50t- 
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Figure   3.4        Benchaarked   Relations  -    Very   Large   Tuple   Size. 
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Figure  3.5   Relative  Performance  -  Changes  in  Tuple  Size. 
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b.  Variability   of    Database   Size      in   Terms    cf   Number 
of    B  loc  k  s 

Figure  3.6  depicts  thrae  join  operations  over  a 
large    database      cf    3  1350      blocks,      NPSTEST.  Additionally, 

represented  in  Figure  3.6  are  the  same  three  joins  over 
three  small  databases,  NPS4  of  150  blocks,  NPS5  of  750 
blocks,      and      NPS6    of      1500    blocks.  The   results      of    these 

gueries  reveal  that  the  block  size  is  not  a  significant 
factor   in    jcin    time. 

c.  Variability  of    Database   Disk   Placement 

Every  database,  namely,  NPS1,  NPS2,  NPS3,  NPS11, 
NPS12,  or  NPS13  contains  only  two  relations.  NPS1,  NPS2, 
and  NPS3  have  been  created  on  the  same  disk.  NPS11,  NPS12, 
and  NPS13  have  been  created  separately,  each  of  which  occu- 
pies two  disks.  Figure  3.7  depicts  -he  time  for  joins  on 
NPS1,  NPS2,  and  NPS3  versus  the  same  joins  conducted  on 
NPS11,  NFS12,  and  NPS13.  The  results  strongly  suggest  that 
database  disk  placement,  especially  for  relatively  small 
databases,    is  net   a    major   factor    in    join   time. 

d.  Variability   of    Index   Structure 

A  query  stream  has  been  run  on  NPS11,  NPS12,  and 
NPS13.  During  the  run  the  index  structure  on  the  relations 
in  the  databases  has  been  modified  from  clustered  to 
nonclustered  and  then  eliminated.  Figure  3.8  depicts  the 
jcin  times  in  each  situation.  From  the  results  obtained,  it 
can  be  reasonably  assumed  that  for  relations  of  this  size 
there  is  no  significant  difference  between  join  times  on 
clustered  and  nonclustered  indices.  However,  the  join  times 
for  those  relations  vith  no  indices  have  exhibited  increases 
exponentially. 
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THE    IMPACT    OF    DATABASE    BLOCKSIZE    ON 
JOIN   TIME    -    COMPARISONS   FOR   LARGE 

.(NPSTEST)    and_SMALL(NPS4,5,6)    DATABASES 
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Figure    3.6        The    Impact  of   Database   Block  Size    on  Joins. 
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Figure  3.7   The  Impact  of  Disk  Placements  on  Joins. 
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Figure  3.8    The  Impact  of  Indices  on  Joins. 
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e.      Variability   of    Machine  Configuration 

As    stated     earlier,      daring   the  course      of    these 
experiments,        the    database      machine      being  benchmarked      has 

operated  under  three  different  conif gurations :  1 /2-megabyte 
cache  memory  without  a  database  accelerator,  2-megabyte 
cache  meacry  without  a  database  accelerator,  and  2-megabyte 
cache  mercery  with  a  database  accelerator.  Joins  over  rela- 
tions with  the  fixed  tuple  size  of  100  bytes  in  the  data- 
base, NPSTEST,  have  been  conducted  on  all  three 
configurations.  The  comparative  results  of  these  joins  are 
depicted  in  Figure  3.9.  These  results  show  that  an  increase 
in  cache  memory  size  from  1/2  to  2  megabytes  improved  join 
time  by  a  factor  of  27%  to  31$.  The  addition  of  the  data- 
base accelerator  to  the  2-megabyte  cache  improved  the  join 
time  by  a  factor  of  6%  to  12%  only.  These  results  would 
seem  to  clearly  indicate  that,  tor  the  jcin  operation,  a 
larger  cache  memory  is  much  more  effective  than  the  addition 
cf   a    database  accelerator. 

5  •      Selection    Experiments 

In  addition  tc  the  equality  joins  described  so  far, 
there  has  been  an  additional  qualification  designed  to 
select  only  a  certain  portion  of  the  joined  tuples  for 
display.  The  number  of  tuples  to  be  displayed  is  to  be  5  % 
of  the  number  of  tuples  in  the  smaller  relation  of  the  two 
relations  in  each  jcin.  To  accomplish  this  objective  for 
the  jcin  cf  the  500-tuple  relation  and  the  1000-tuple  rela- 
tion, the  additional  qualification  is  to  impose  a  "<  25" 
restriction   en    the    KEY  attribute.  That    is,      the   relations 

have  been  joined  on  the  equality  of  the  KEY  field  in  each 
relation,  and  there  has  been  the  additional  qualifier  that 
those  tuples  to  be  displayed  must  have  a  KEY  value  that  is 
less    than    "25".      For    the    join    of    the    2500-tuple    relation    and 
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Figure   3.9       The   Impact   of    Machine  Configurations    on   Joins. 
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the  5000-tuple  relation  the  restriction  is  "<  125",  and  for 
the  jcio  of  ths  5000-tuple  relation  and  the  10000-tuple 
relation   the  restriction    is    " <   250". 

Figure  3.10  depicts  the  response  times  for  these 
join  selections.  Figure  3.  11  depicts  the  response  times  for 
the  same  joins  for  which  there  is  no  5%-selection  restric- 
tion, a  comparison  of  the  results  of  each  join  reveals  that 
especially  for  the  join  of  the  larger  relations  the  differ- 
ence in  response  time  is  proportionally  greater.  These 
significant  differences  are  likely  due  to  at  least  two 
prevalent  factors.  First  of  all,  there  is  an  I/O  overhead 
that  undoubtedly  comprises  a  major  portion  of  the  differ- 
ence. Secondly,  it  is  highly  probable  that  for  this  type  of 
join  the  select  operation  is  performed  first,  and  then  the 
actual  join  is  performed.  A  comparison  of  Figures  3.10  and 
3.11    would    support    this    hypctheis. 

6  •      Cther   Equality-Join    Experiments 

Figure  3.12  depicts  a  comparison  between  two  sg+s  of 
three  joins  en  the  same  relations  with  nonclustered  indices. 
The      first    set      requires   no      relations   to      be   sorted.  The 

second  set  requires  the  relations  to  be  sorted  on  an  attri- 
bute ether  than  the  KEY  attribute  on  which  the  index  is 
based.  The  comparative  results  of  the  runs  for  these  joins 
are  close.  The  plotted  curves  for  the  response  times  cress 
themselves.  This  may  indicate  that  the  sorting  of  relations 
on  the  basis  of  a  non-key  attribute  does  not  improve  the 
join    time. 

Figure  3.13  depicts  a  comparison  between  two  sets  of 
the  same  three  joins  for  which  the  expression  of  the 
equality  predicate  has  been  reversed.  For  these  particular 
joins  the  reversal  of  the  expression  of  the  equality  predi- 
cate   appears  to    be    insignificant    as   a    factor    in    join    time. 
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Figure   3-10        Three   5%-Join    Selections. 
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Figure  3.11    Three  Joins  Without  Selection. 
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Figure  3.12    Joins  on  Sorted  and  Unsorted  Relations. 
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Figure  3.13    Joins  With  Predicates  Reversed. 
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E.  INEQOALITY    JOINS 

A   limited  number   of   inequality      joins   has   been   conducted 
during   the    course   of    these    experiments. 

1  .      A    Definition 

Fcr  these  experiments  an  inequality  join  is  ens  in 
which  9  is  defined  as  a  mathematical  inequality.  That  is, 
the  statement  following  the  qualification  WHERE  in  RQL 
contains  either  "!"  or  "<"  or  ">".  This  qualification  has 
been  imposed  on  the  KEY  attribute. 

2  •      Experiments 

Inequalities  have  been  applied  to  the  join  cf  a 
500-tuple  relation  and  a  1000-tuple  relation  and  to  the  join 
of   a    2500-tupie    relation    and   a    5000-tuple   relation. 

3  •      Disastrous    Results 

The  results  of  these  joins  have  proven  to  be  disast- 
rous. Fcr  even  the  smaller  join  of  the  500-tuple  relation 
and  the  1000-tuple  relation,  the  response  time  has  run  into 
hours.  This       long      response        time      has      jeopardized      the 

integrity  of  the  experiments,  since  during  the  course  of  the 
run  the  status  of  the  host  machine  has  experienced  signifi- 
cant fluctuations  in  load  conditions.  Obviously,  it  may 
prove  the  point  that  the  inequality  joins  cannot  be 
supported   by  the   machine   with    any   reasonable    response    time. 

F.  THE    THREE-WAY    JOIN 

1  .      A    Definition    and    Example 

Fcr  these  experiments  a  three-way  join  is  simply  a 
composition      of    ■'•hree     relations    via      equality    joins.  The 

three   relations    have    been    joined   en      the   equality   of    *he   KEY 
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attribute  of  each  relation.  That  is,  for  relations  A,  Br 
and  C,  the  join  has  been  accomplished  WHERE  A. KEY  =  E. KEY 
and    B.KEY    =    C.  KEY. 

2  •      Experiments 

The  three  relations  that  have  been  joined  are  a 
500-tuple  relation,  a  1000-tuple  relation,  and  a  25  00-4-.uple 
relation.  No  selection  restriction  has  been  imposed  or  the 
join . 

The  response  time  for  this  query  is  .8114  minutes. 
A  two-way  join,  under  similar  conditions,  cf  the  same 
500-tuple  and  1000-tuple  relations  has  been  accomplished  in 
.7011  minutes.  The  small  increase  of  the  response  time  from 
the  two-way  join  to  the  three-way  join  of  .1103  minutes 
(15.7??)  would  appear  to  further  demonstrate  the  significance 
of  the  one-time  I/O  overhead  in  joins.  In  other  words, 
regardless  of  tha  number  of  ways  a  join  is  to  be  conducted, 
the  one-tine  I/O  overhead  would  consume  a  substantial 
portion      of    the      join  time.  In   this      case,      the      overhead 

consumes    about    65%    of   the  three-way    join    time. 

G.       JOINS    VERSUS    VIE1S 

1  •      li^   1L-.1  i.2    Jki    3en  chmarkad   2uerv_   Language 

In  RQL  the  CHEATS  VIEW  command  is  used  to  set  up  a 
virtual  relator,  which  is  composed  of  attributes  cf  one  or 
mors    relations.  The  VIEW      is  not      physically    a      relation. 

Bather,      its     definition   is      stored    in      the   database.  The 

following   example   creates  a    new    virtual    relation,    LOCATOR: 

RANGE   of    P    is    Personnel 
RANGE   of    D    is    Department 
CREATE    VIEW    LOCATOR(P. name, D. name, D. of fice, D. phone) 

WHERE   ?. dept    =   d. name 
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2  •      Experiments   on  visws 

The  views  have  been  defined  and  stored  in  the  appro- 
priate databases,  before  their  use  for  comparison  to  join 
operations.  For  both  the  views  and  the  joins,  projection 
has  been  limited  to  five  attributes,  but  no  restriction  has 
been    imposed  on    selection. 

The  views  have  been  created  from  a  500-tupls  rela- 
tions and  a  1000-tuple  relation;  a  2500-tuple  relation  and  a 
5000-tuple  relation;  and  a  5000-tuple  relation  and  a 
10000-tuple    relation.  These   relations   exist      in    databases 

NPS11,  NFS12,  and  NPS13,  respectively.  Likewise,  the  joins 
have  been  accomplished  on  these  same  relations  and  data- 
bases . 

Figure  3.14  depicts  the  comparative  response  time 
for  each  of  the  three  situations.  The  remarkable  similarity 
in  response  times  between  views  and  joins  for  these  experi- 
ments would  seem  to  point  out  that  the  views  are  no  more 
expensive  and  inefficient  to  use  than  the  joins.  In  certain 
situations,  however,  the  views  could  be  of  greater  value, 
since  they  require  very  little  disk  space  as  compared  to  the 
physical  space  needed  by  the  tuples  of  the  joins. 
Additionally,  the  view  appears  to  provide  the  user  greater 
flexibility    for    controlling    access   to    the    database. 
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Figure    3.14        Joins   Versus    Views. 

48 


IV.    CONCLUDING    REMAINS 

A.       GENEBAL    COMMENTS 

The  experiments  discussed  above  have  revealed  several 
interesting  results,  notably  the  consistent  linearity  in 
join  times  and  the  apparent  significant  join  overhead 
undoubtedly     resulting      from   bus      contention.  Figure      4.1 

illustrates   both  of    these  charactersistics . 

More  specifically.  Figure  4.1  depicts  the  total  join 
time  for  various  numbers  of  blocks  of  joined  data.  The 
inherent  overhead  is  clearly  evident  for  access  to  less  than 
1000  blocks  while  these  joins  involved  with  1000  or  -ncre 
blocks  clearly  demonstrate  the  consistent  linearity  as 
previously    discussed. 

As  also  previously  discussed,  the  GETTIME  function  in 
RQL  has  been  the  only  measurement  tool  employed.  Although 
no  hardware  or  software  probes  have  been  available,  the 
experiments  that  have  been  run  using  GETTIME  have  provided 
enough  information  so  that  some  statement  concerning  the 
mean  of  attainable  block  access  time  can  be  made.  Figure 
4.2  depicts  the  average  block  access  time  for  each  tuple 
template  and  the  effects  on  this  average  as  the  join  has 
been  repeated  over  increasingly  larger  relations  (in  the 
number  of   tuples). 

In  Figure  4.2  it  is  evident  that  the  overhead  of  the 
initial  access  is  being  absorbed  as  the  size  of  the  rela- 
tions being  joined  increases.  Ey  repeating  the  same  join 
for  increases  in  both  blocks  size  and  number  of  tuples 
accessed,  some  representative  mean  access  times  can  be 
ascertained.  That  is,  the  access  time  curves  will  approach 
some    asymptotic    lower  bound. 
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Figure   4.1         Block    Access   Times. 
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Figure  4.2    Mean  Access  Times, 
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Figure  4.2  also  reveals  that  for  this  particular  data- 
base machine  that  it  is  more  efficient  (or  profitable)  to 
perform  jcirs  en  larger  relations.  The  access  times  for  the 
smaller  relations  are  much  higher  than  the  access  times  for 
the      larger      relations.  As      the        size      of      the      relation 

increases,  the  mean  access  time  demonstrates  a  convergence 
to  a  representative  number.  This  number,  the  mean  access 
time,  can  te  considered  an  important  charactersitic  of  this 
particular   benchmarking   experiment. 

B.       A  COMPARISON  OF  DIFFERENT  ACCELERATOR/CACHE 

CONFIGURATIONS 

This  benchmarking  experiment  has  not  been  designed  as  an 
analysis  of  several  differently  configured  RDM  1100s. 
However,  while  this  benchmarking  is  making  progress,  the 
availability  of  more  cache  and  the  database  accelerator  has 
stimulated  much  interest  in  the  performance  differences  for 
the  different  machine  configurations.  Therefore,  consider- 
able time  has  been  expended  towards  accumulating  comparable 
data  for  each  of  the  three  configurations  on  which  experi- 
ments  have    been    run. 

In  Chapter  III  there  is  a  brief  discussion  of  the 
differences  in  join  times  for  the  relations  of  100-byte 
tuples.  The  following  discussion  focuses  en  the  24  joins 
conducted  on  the  database,  NPSTEST,  for  each  of  the  +hree 
cc  n  f  i  gu  r  a  t  i  o  ns  . 

Table  III  summarizes  the  average  percentage  decrease  in 
join  time  for  each  join  as  the  amount  of  cache  is  increased 
from  1/2  megabyte  to  2  meagbytes.  Table  III  also  summarizes 
the  further  decrease  in  join  time  as  the  database  acceler- 
ator is  added  to  the  2-megabyte  cache  configuration.  This 
summary  rsveals  larger  decreases  in  the  join  time  as  the 
sizes    cf      the   relations      being    joined      increase.         In      ether 
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words,  as  the  initial  join  overhead  is  absorbed,  the  addi- 
tional cache  increasingly  decreases  the  join  time  by  a 
percentage      of      approximately     593.  Correspondingly,        it 

appears  that  the  effects  of  adding  a  database  accelerator  to 
the  2-megabyte  cache  are  less  significant  for  the  larger 
relations,    although    in  all    cases   there   is    some  improvement, 

C.       THE    METHODOLOGY     AND    ITS    LIMITATIONS 

The  methodology  that  has  teen  discussed  in  this  paper 
has  fundamentally  sound  origins,  and  the  experimental 
approach  of  varying  join  parameters  has  and  should  continue 
to  provide  relevant  information  from  which  insight  can  be 
drawn.  However,  as  discussed  above,  benchmarking  is  a  rela- 
tively new  area  of  research  in  computer  science,  and 
certainly  the  techniques  that  have  been  applied  throughout 
the    ccurse    of  these    experiments   can    be   improved   and   refined. 

A  definitive  performance  pronouncement  on  the  RDM  1100 
has  net  been  the  ultimate  goal  due  to  the  use  of  the  GETTIM"? 
function      of      RQL.  Despite    its      "coarseness"      in      getting 

performance  measurements,  the  GETTIME  function  has  been 
deemed  accurate  enough  for  the  purposes  of  our  experiments. 
Actually,  this  function  has  been  considered  sufficiently 
accurate  in  view  of  the  lack  of  other  more  accurate  measure- 
ment tools.  Probes  have  not  been  available,  and  software 
packages  for  performance  data  collection  have  been  delayed 
and  are  unavailable  for  these  experiments.  Future  attempts 
to  benchmark  such  a  system  should  utilize  additional  methods 
for    determining    relevant    performance    data. 

The  benchmarking  cf  the  RDM  1100  is  a  project  of  seem- 
ingly low  priority  at  the  command  which  houses  the  host 
ONIVAC  system.  Existing  workloads  demand  vast  amount  of  the 
system's  resources,  and  in  reality  it  has  been  quite  diffi- 
cult   tc    "control"    the  environment   in    which    these    experiments 
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have  fcesn  conducted.  Thus  the  load  conditions  of  the  host 
system  may  have  compromised  the  integrity  of  some  results. 
Additionally,  the  majority  of  the  experiments  have  been 
conducted  from  a  remote  terminal  which  has  probably  further 
degraded   the     experimental    results.  Obviously,      on      site, 

strictly  controlled  experimentation  is  the  ideal  practice 
for    benchmarking   experiments. 

Our  inability  to  control  the  host  environment  raises  yet 
another    issue.  The   goal      of   the      experiments   has      been   to 

collect  measurements  on  joins  for  which  certain  parameters 
are  varied.  However,  a  major  parameter  has  not  been  varied. 
That      parameter    is      the     load   condition      of      the    host.  As 

described  above,  attempts  have  been  made  to  run  experiments 
at  times  cf  minimal  hest  activity.  In  actual  practice,  the 
database  machine  is  likely  to  be  benchmarked  during  periods 
of   peak   host  activity.  Future    benchmarking   efforts    should 

take  this  into  consideration,  and  attempts  should  be  made  to 
contrcl  and  vary  host  load  conditions  as  part  of  the  mix  of 
query  scripts.  In  view  of  the  minimal  host  activity,  the 
results  we  have  obtained  may  be  considered  as  the  optimal 
performance    cf    the    REM   1100    for    join    operations. 

As  the  deadline  fcr  submission  of  this  thesis  has  drawn 
near,  planned  experiments  have  been  cancelled  from  the 
testing  agenda.  A  "time  crunch"  has  resulted  from  a  variety 
cf  sources.  Primary  of  these  sources  has  been  the  contin- 
uing requirements  to  correct  software  deficiencies  that  have 
been  identified  as  a  result  of  the  experiments  that  have 
been    conducted.  Likewise,      the      changing   of      the    database 

machine  configuration  has  also  severely  cut  into  the  time 
available  to  run  the  full  set  of  planned  experiments.  In 
essence,  although  a  great  deal  of  relevant  data  has  been 
collected,  the  consistency  of  some  data  may  be  questionable 
since  a  limited  number  of  experiments  has  been  conducted  in 
each    araa   of  experimentation. 
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Besides  these  limitations  and  deficiencies,  the  experi- 
ments that  have  been  conducted  have  provided  enough  relevant 
informaticn  from  which  valuable  conclusions  can  be  drawn. 
The  results  of  the  join  experiments  described  here,  when 
combined  with  those  results  of  selection  and  projection 
experiments,  comprise  a  substantial  starting  point  for  the 
comparison  of  similar  database  machine  architectures.  They 
provide  a  solid  framework  for  benchmarking  relational  data- 
base   machines. 
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